Mirror Descent

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.


History
Mirror descent was originally proposed by Nemirovski and Yudin in 1983 (Arkadi Nemirovsky and David Yudin. Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons, 1983).


Motivation
In gradient descent with the sequence of learning rates (\eta_n)_{n \geq 0} applied to a differentiable function F, one starts with a guess \mathbf{x}_0 for a local minimum of F and considers the sequence \mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots such that

\mathbf{x}_{n+1}=\mathbf{x}_n-\eta_n \nabla F(\mathbf{x}_n),\ n \ge 0.

This can be reformulated by noting that

\mathbf{x}_{n+1}=\arg \min_{\mathbf{x}} \left(F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^T (\mathbf{x} - \mathbf{x}_n) + \frac{1}{2 \eta_n}\|\mathbf{x} - \mathbf{x}_n\|^2\right).

In other words, \mathbf{x}_{n+1} minimizes the first-order approximation to F at \mathbf{x}_n with added proximity term \|\mathbf{x} - \mathbf{x}_n\|^2.
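
Indeed, setting the gradient of the minimized expression to zero recovers the gradient descent step:

\nabla F(\mathbf{x}_n) + \frac{1}{\eta_n}(\mathbf{x} - \mathbf{x}_n) = 0 \quad \Longleftrightarrow \quad \mathbf{x} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n).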

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms, such as the multiplicative weights (Hedge) update, which may be better suited to optimization over particular geometries.


Formulation
We are given a convex function f to optimize over a convex set K \subset \mathbb{R}^n, and a norm \|\cdot\| on \mathbb{R}^n.

We are also given a differentiable convex function h \colon \mathbb{R}^n \to \mathbb{R} that is \alpha-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient \nabla h \colon \mathbb{R}^n \to \mathbb{R}^n is known as the mirror map.

Starting from an initial point x_0 \in K, each iteration of mirror descent performs the following steps (a concrete sketch follows the list):

  • Map to the dual space: \theta_t \leftarrow \nabla h (x_t)
  • Update in the dual space using a gradient step: \theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)
  • Map back to the primal space: x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})
  • Project back to the feasible region K: x_{t+1} \leftarrow \mathrm{arg}\min_{x \in K}D_h(x||x'_{t+1}), where D_h is the Bregman divergence.
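
Below is a minimal Python sketch of this scheme, assuming for concreteness that K is the probability simplex and that h is the negative entropy h(x) = \sum_i x_i \log x_i (the helper name mirror_descent_simplex is hypothetical). With this choice the dual gradient step followed by the inverse mirror map is a multiplicative update, and the Bregman (KL) projection onto the simplex is a renormalization, recovering the multiplicative weights update mentioned above.

  import numpy as np

  def mirror_descent_simplex(grad_f, x0, etas):
      # Mirror descent on the probability simplex with h(x) = sum_i x_i log x_i.
      # The mirror map is (nabla h)(x)_i = 1 + log x_i; the dual step followed by
      # (nabla h)^{-1} multiplies each coordinate by exp(-eta * g_i), and the
      # Bregman (KL) projection back onto the simplex is a renormalization.
      x = np.asarray(x0, dtype=float)
      for eta in etas:
          g = grad_f(x)                # gradient of f at the current iterate
          x = x * np.exp(-eta * g)     # dual gradient step + inverse mirror map
          x = x / x.sum()              # Bregman projection onto the simplex
      return x

  # Example: minimize the linear function f(x) = <c, x> over the simplex;
  # the iterates concentrate on the coordinate with the smallest cost.
  c = np.array([0.9, 0.2, 0.5])
  x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3, etas=[0.5] * 200)
  print(np.round(x, 3))                # approximately [0., 1., 0.]

Choosing h(x) = \tfrac{1}{2}\|x\|_2^2 instead makes the mirror map the identity, and the scheme reduces to projected gradient descent.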


Extensions
Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).
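
In the standard online formulation, at each round t the learner plays x_t \in K, then observes a convex loss f_t and applies the update above with \nabla f_t(x_t) in place of \nabla f(x_t); the goal is to control the regret \sum_t f_t(x_t) - \min_{x \in K} \sum_t f_t(x).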


See also
